AWS for Data Science
Data Sources
Data Storage
- Amazon RDS: transactional data (AWS Storage & Databases#Amazon Relational Database Service (Amazon RDS))
- Amazon S3: file-based data (AWS Storage & Databases#Amazon Simple Storage Service (Amazon S3))
- Amazon Redshift: analytical data (AWS Storage & Databases#Amazon Redshift)
- Amazon DynamoDB: non-relational data (AWS Storage & Databases#Amazon DynamoDB)
Data Ingestion & Processing
- Batch processing: Processes large volumes of accumulated data in a single run
- AWS Lambda (AWS All-in-one#^08bc66)
- AWS Glue Data Catalog
- part of AWS Glue, a serverless ETL service for data management
- registering data creates a reference to it in the catalog
- only metadata/schema is stored in the catalog; no data is moved
- Amazon EMR
- Hadoop and Spark as a service, running at any scale with no complex setup.
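A minimal sketch of what "registering data" means for the Glue Data Catalog: the `TableInput` below, in the shape accepted by the Glue `CreateTable` API (boto3 `glue.create_table`), records only column names, types, the S3 location, and the file format of CSV files assumed to already be in S3. Nothing in the request reads or copies the files. The table name, bucket, prefix, database name, and columns are hypothetical, and the serde settings assume plain comma-delimited text.

```python
import json


def build_glue_table_input(table_name, s3_location, columns):
    """Build a TableInput for glue.create_table describing CSV files already in S3.

    Only schema and location are recorded; the files themselves are not touched.
    """
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",  # data lives outside the catalog, in S3
        "Parameters": {"classification": "csv"},
        "StorageDescriptor": {
            "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},  # assumes comma-delimited files
            },
        },
    }


if __name__ == "__main__":
    table_input = build_glue_table_input(
        "orders",                            # hypothetical table name
        "s3://example-bucket/raw/orders/",   # hypothetical bucket/prefix
        [("order_id", "string"), ("amount", "double"), ("order_date", "date")],
    )
    print(json.dumps(table_input, indent=2))
```

With credentials configured, the same dict could be passed as `boto3.client("glue").create_table(DatabaseName="sales_db", TableInput=table_input)`, where `sales_db` is a hypothetical, already-created database.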
- Stream processing: Processes small records continuously as they arrive
- Amazon Kinesis
- Amazon Kinesis Data Firehose:
- capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools.
- Amazon Kinesis Data Streams:
- build custom, real-time applications that process data streams using popular stream processing frameworks.
- Amazon Kinesis Video Streams:
- stream video from connected devices to AWS
- Amazon Kinesis Data Analytics:
- process data streams in real time with SQL (no new programming language or framework to learn) or with Java applications built on Apache Flink.
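For Kinesis Data Streams, each record a producer sends carries a partition key; Kinesis hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value, so records sharing a key land on the same shard. The sketch below, assuming JSON-encoded events and a hypothetical `device` field as the key, builds entries for the `PutRecords` API in batches of at most 500 (the per-request limit) and reproduces the hash only for illustration.

```python
import hashlib
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_records(events, key_field):
    """Turn event dicts into PutRecords entries (JSON bytes + partition key)."""
    return [
        {"Data": json.dumps(event).encode("utf-8"), "PartitionKey": str(event[key_field])}
        for event in events
    ]


def batches(records, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def hash_key(partition_key):
    """128-bit value derived from a partition key via MD5; it determines the shard."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


if __name__ == "__main__":
    # Hypothetical sensor events keyed by device id.
    events = [{"device": "sensor-1", "temp": 21.5}, {"device": "sensor-2", "temp": 19.0}]
    records = to_kinesis_records(events, key_field="device")
    for batch in batches(records):
        print(len(batch), [r["PartitionKey"] for r in batch])
    # Same key -> same hash -> same shard.
    print(hash_key("sensor-1") == hash_key("sensor-1"))
```

With credentials configured, each batch could be sent with `boto3.client("kinesis").put_records(StreamName="sensor-events", Records=batch)` (stream name hypothetical). `PutRecords` can partially fail, so the response's `FailedRecordCount` should be checked and failed entries retried.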
Data Lake
Data Warehouse
- Amazon Redshift (AWS Storage & Databases#Amazon Redshift)
Data Consuming/Visualization
- Amazon Athena
- query data in S3 using SQL
- Amazon QuickSight
- cloud-powered business intelligence (BI) service for data visualization
- Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)
- operational analytics
- Amazon Redshift
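Athena queries run asynchronously: a query is submitted, its status is polled until it reaches a terminal state, and results are read once it succeeds (Athena also writes them to an S3 output location). A minimal sketch, assuming a boto3 Athena client and a table already registered in the Glue Data Catalog; the database, SQL, and result bucket in the usage line are hypothetical.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for a terminal state, and return rows as lists of strings.

    `athena` is expected to be a boto3 Athena client (boto3.client("athena")).
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until Athena reports SUCCEEDED, FAILED, or CANCELLED.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended as {status['State']}: {reason}")

    # Reads only the first page of results; each row is {"Data": [{"VarCharValue": ...}, ...]}
    # and, for SELECT queries, the first row holds the column names.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]
```

Usage: `run_athena_query(boto3.client("athena"), "SELECT order_date, SUM(amount) FROM orders GROUP BY order_date", "sales_db", "s3://example-bucket/athena-results/")`. Larger result sets need `NextToken` pagination on `get_query_results`.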
ML Modeling
Machine learning workflow
- Amazon SageMaker
- Amazon Augmented AI (Amazon A2I)
- adds human review of model predictions (Machine Learning Systems Design#Human review of model predictions)
- steps
- define human workforce -> define worker task UI -> define human review workflow (flow definition)
- -> start a human loop from AWS AI service API calls (e.g., Amazon Textract, Amazon Rekognition)
- -> or start a human loop from your own custom ML model's predictions
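For the custom-model branch of the A2I steps, the application decides which predictions go to people and starts one human loop per prediction through the A2I runtime `StartHumanLoop` API (boto3 client `sagemaker-a2i-runtime`). A minimal sketch, assuming predictions are dicts with a `confidence` score, a hypothetical 0.80 threshold, and a human review workflow (flow definition) that already exists; the flow definition ARN and prediction fields are hypothetical.

```python
import json
import uuid

CONFIDENCE_THRESHOLD = 0.80  # hypothetical cut-off; tune per use case


def needs_human_review(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Send low-confidence predictions to human reviewers."""
    return prediction["confidence"] < threshold


def build_start_human_loop_request(flow_definition_arn, prediction):
    """Keyword arguments for the A2I runtime StartHumanLoop call."""
    return {
        # Loop names must be unique; lowercase hex keeps them to letters, digits, hyphens.
        "HumanLoopName": f"review-{uuid.uuid4().hex[:16]}",
        "FlowDefinitionArn": flow_definition_arn,
        # InputContent is a JSON string; its keys must match what the task UI template reads.
        "HumanLoopInput": {"InputContent": json.dumps(prediction)},
    }


def route(predictions, flow_definition_arn):
    """Split predictions into auto-accepted ones and StartHumanLoop requests."""
    accepted, review_requests = [], []
    for prediction in predictions:
        if needs_human_review(prediction):
            review_requests.append(build_start_human_loop_request(flow_definition_arn, prediction))
        else:
            accepted.append(prediction)
    return accepted, review_requests
```

Each request could then be sent with `boto3.client("sagemaker-a2i-runtime").start_human_loop(**request)`. For the built-in integrations (e.g., Amazon Textract and Amazon Rekognition), the human loop is instead configured on the AI service call itself, so routing code like this is not needed there.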